Overview

Dataset statistics

Number of variables: 15
Number of observations: 2125
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 12
Duplicate rows (%): 0.6%
Total size in memory: 249.1 KiB
Average record size in memory: 120.1 B
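
The percentage and per-record figures above follow from the raw counts. The sketch below recomputes them from the numbers in this table; the helper name `percent` is only illustrative, and because the total size is itself rounded to 249.1 KiB, the recomputed record size can differ from the reported 120.1 B in the last digit.

```python
# Figures copied from the "Dataset statistics" table above.
n_rows = 2125
n_duplicate_rows = 12
total_size_kib = 249.1


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


duplicate_pct = percent(n_duplicate_rows, n_rows)

# Average bytes per record, starting from the rounded total size.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(duplicate_pct)
print(round(avg_record_bytes, 1))
```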

Variable types

Numeric: 10
Categorical: 5

Warnings

Dataset has 12 (0.6%) duplicate rows [Duplicates]
name has a high cardinality: 451 distinct values [High cardinality]
host_name has a high cardinality: 201 distinct values [High cardinality]
last_review has a high cardinality: 308 distinct values [High cardinality]
id is highly correlated with host_id and 1 other field [High correlation]
host_id is highly correlated with id [High correlation]
latitude is highly correlated with longitude [High correlation]
longitude is highly correlated with latitude [High correlation]
number_of_reviews is highly correlated with id and 1 other field [High correlation]
reviews_per_month is highly correlated with number_of_reviews [High correlation]
number_of_reviews is highly correlated with reviews_per_month [High correlation]
longitude is highly correlated with neighbourhood and 4 other fields [High correlation]
price is highly correlated with room_type [High correlation]
number_of_reviews is highly correlated with latitude and 2 other fields [High correlation]
neighbourhood is highly correlated with longitude and 4 other fields [High correlation]
latitude is highly correlated with longitude and 5 other fields [High correlation]
reviews_per_month is highly correlated with number_of_reviews and 3 other fields [High correlation]
calculated_host_listings_count is highly correlated with longitude and 2 other fields [High correlation]
room_type is highly correlated with price [High correlation]
id is highly correlated with longitude and 5 other fields [High correlation]
host_id is highly correlated with longitude and 5 other fields [High correlation]
availability_365 has 228 (10.7%) zeros [Zeros]
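
As a rough illustration of how a "High correlation" warning can arise, the sketch below computes Pearson's r for every pair of columns and flags pairs whose absolute value reaches a cutoff. The 0.9 threshold, the function names, and the toy columns are assumptions made for this example; the report does not state which correlation measures or cutoffs produced the warnings above.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def high_correlation_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold (assumed cutoff)."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Toy values, not taken from the dataset.
toy = {
    "number_of_reviews": [1, 12, 54, 139, 280],
    "reviews_per_month": [0.1, 0.5, 1.9, 3.6, 6.3],
    "minimum_nights": [30, 1, 2, 1, 3],
}
flagged = high_correlation_pairs(toy)
print(flagged)
```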

Reproduction

Analysis started: 2021-06-22 22:02:48.723691
Analysis finished: 2021-06-22 22:03:55.465594
Duration: 1 minute and 6.74 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
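
The reported duration can be checked against the two timestamps. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section above.
started = datetime.fromisoformat("2021-06-22 22:02:48.723691")
finished = datetime.fromisoformat("2021-06-22 22:03:55.465594")

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minute and {seconds:.2f} seconds")
```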

Variables

id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 431
Distinct (%): 20.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23491517.6
Minimum: 8521
Maximum: 48157277
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 8521
5-th percentile: 1799157
Q1: 13169945
median: 22888033
Q3: 33719501
95-th percentile: 45763073.4
Maximum: 48157277
Range: 48148756
Interquartile range (IQR): 20549556

Descriptive statistics

Standard deviation: 13795942.02
Coefficient of variation (CV): 0.5872733407
Kurtosis: -1.092730333
Mean: 23491517.6
Median Absolute Deviation (MAD): 10398951
Skewness: 0.06111615073
Sum: 4.99194749 × 10^10
Variance: 1.903280161 × 10^14
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
12318450 | 9 | 0.4%
1225831 | 9 | 0.4%
19346242 | 9 | 0.4%
33245746 | 9 | 0.4%
34664077 | 9 | 0.4%
34944649 | 9 | 0.4%
32986957 | 9 | 0.4%
2538983 | 9 | 0.4%
17285749 | 9 | 0.4%
6185544 | 9 | 0.4%
Other values (421) | 2035 | 95.8%

Value | Count | Frequency (%)
8521 | 7 | 0.3%
79762 | 6 | 0.3%
108898 | 6 | 0.3%
456429 | 6 | 0.3%
577384 | 5 | 0.2%
715532 | 1 | < 0.1%
742574 | 7 | 0.3%
1140201 | 4 | 0.2%
1141088 | 6 | 0.3%
1154298 | 6 | 0.3%

Value | Count | Frequency (%)
48157277 | 1 | < 0.1%
48108594 | 2 | 0.1%
48107460 | 1 | < 0.1%
47703691 | 1 | < 0.1%
47538101 | 3 | 0.1%
47296747 | 3 | 0.1%
47223523 | 2 | 0.1%
46938818 | 3 | 0.1%
46903645 | 1 | < 0.1%
46820093 | 2 | 0.1%
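
Several of the statistics reported for id are derived from others in the same section. The sketch below recomputes the range, interquartile range, coefficient of variation, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation; small differences in the last digits are expected because the inputs are rounded.

```python
# Values copied from the id statistics above.
minimum, maximum = 8521, 48157277
q1, q3 = 13169945, 33719501
mean, std = 23491517.6, 13795942.02

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1
cv = std / mean                  # CV = standard deviation / mean
variance = std ** 2              # Variance = standard deviation squared

print(value_range, iqr, round(cv, 6), f"{variance:.6e}")
```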

name
Categorical

HIGH CARDINALITY

Distinct: 451
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 16.7 KiB

3 Bed 2 Bath Tourist House near Kendall Square | 12
CLOSE TO HARVARD&MIT | 10
keyless private room near MIT,Central Sq 2 | 9
City Oasis |Deck & Yard |Walk To Harvard MIT Train | 9
Harvard MIT: Artist's Home | 9
Other values (446) | 2076

Length

Max length: 70
Median length: 42
Mean length: 40.36705882
Min length: 14

Characters and Unicode

Total characters: 85780
Distinct characters: 88
Distinct categories: 13
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
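
As a sketch of how the category counts in this section could be produced, the example below tallies the Unicode general category of every character using Python's `unicodedata` module. The function name and the two sample strings (taken from the Sample rows of this variable) are only for illustration; scripts and blocks are not covered because the standard library does not expose them. The two-letter codes map to the names used in the report, for example `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for text in values:
        counts.update(unicodedata.category(ch) for ch in text)
    return counts


sample = ["Middle Room in Shared Apt", "Large Downstairs Room"]
counts = category_counts(sample)
print(counts.most_common())
```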

Unique

Unique: 59
Unique (%): 2.8%
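
The report distinguishes Distinct (451) from Unique (59). Under the usual reading, which is an assumption here since the report does not define the terms, distinct counts the different values present, while unique counts the values that occur exactly once. A minimal sketch with a toy list:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return distinct, unique


# Toy values, not taken from the dataset.
result = distinct_and_unique(["a", "a", "b", "c"])
print(result)
```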

Sample

1st row: Middle Room in Shared Apt
2nd row: Large Downstairs Room
3rd row: Victorian Charm MIT/Harvard/Kendall/Central-1BR
4th row: Harvard and MIT - Enjoy Comfort and Convenience!
5th row: Charming Harvard Victorian

Common Values

Value | Count | Frequency (%)
3 Bed 2 Bath Tourist House near Kendall Square | 12 | 0.6%
CLOSE TO HARVARD&MIT | 10 | 0.5%
keyless private room near MIT,Central Sq 2 | 9 | 0.4%
City Oasis |Deck & Yard |Walk To Harvard MIT Train | 9 | 0.4%
Harvard MIT: Artist's Home | 9 | 0.4%
I3 Private Room by Kendall/MIT/Central Statio | 9 | 0.4%
Hey Private Room close to MIT and Harvard Uni | 9 | 0.4%
Fabulous Flat Near Harvard Square | 9 | 0.4%
Convenient Studio *parking* 3-min. walk to subway | 9 | 0.4%
Luxury studio w/ parking by MIT/Harvard/BU/Fenway | 9 | 0.4%
Other values (441) | 2031 | 95.6%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
harvard | 714 | 5.2%
(blank) | 513 | 3.7%
room | 482 | 3.5%
to | 469 | 3.4%
in | 406 | 2.9%
cambridge | 396 | 2.9%
mit | 377 | 2.7%
private | 371 | 2.7%
near | 362 | 2.6%
square | 307 | 2.2%
Other values (588) | 9461 | 68.3%
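
The table above lists lowercased tokens from the listing names. The exact tokenization used by the report is not stated, so the sketch below assumes the simplest rule: lowercase each string and split on whitespace. Under that assumption punctuation stays attached to words, and the function name and sample strings are only illustrative.

```python
from collections import Counter


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all strings."""
    counts = Counter()
    for text in values:
        counts.update(text.lower().split())
    return counts


sample = [
    "Charming Harvard Victorian",
    "Harvard and MIT - Enjoy Comfort and Convenience!",
]
words = word_counts(sample)
print(words.most_common(3))
```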

Most occurring characters

ValueCountFrequency (%)
11867
 
13.8%
a6542
 
7.6%
r6057
 
7.1%
e5459
 
6.4%
o4158
 
4.8%
t3884
 
4.5%
n3282
 
3.8%
i3260
 
3.8%
d2676
 
3.1%
l1886
 
2.2%
Other values (78)36709
42.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 52145 | 60.8%
Uppercase Letter | 16706 | 19.5%
Space Separator | 11867 | 13.8%
Other Punctuation | 2681 | 3.1%
Decimal Number | 1394 | 1.6%
Dash Punctuation | 583 | 0.7%
Math Symbol | 130 | 0.2%
Other Letter | 86 | 0.1%
Open Punctuation | 84 | 0.1%
Close Punctuation | 84 | 0.1%
Other values (3) | 20 | < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a6542
12.5%
r6057
11.6%
e5459
10.5%
o4158
 
8.0%
t3884
 
7.4%
n3282
 
6.3%
i3260
 
6.3%
d2676
 
5.1%
l1886
 
3.6%
m1770
 
3.4%
Other values (16)13171
25.3%
Uppercase Letter
ValueCountFrequency (%)
H1503
 
9.0%
T1448
 
8.7%
R1335
 
8.0%
C1290
 
7.7%
M1252
 
7.5%
S1239
 
7.4%
I1165
 
7.0%
B1161
 
6.9%
A1019
 
6.1%
P671
 
4.0%
Other values (15)4623
27.7%
Other Punctuation
ValueCountFrequency (%)
/998
37.2%
,693
25.8%
&328
 
12.2%
.200
 
7.5%
!158
 
5.9%
*91
 
3.4%
#56
 
2.1%
:53
 
2.0%
'44
 
1.6%
@42
 
1.6%
Other values (2)18
 
0.7%
Decimal Number
ValueCountFrequency (%)
2429
30.8%
1428
30.7%
3297
21.3%
4106
 
7.6%
553
 
3.8%
025
 
1.8%
921
 
1.5%
620
 
1.4%
79
 
0.6%
86
 
0.4%
Other Letter
ValueCountFrequency (%)
16
18.6%
16
18.6%
16
18.6%
16
18.6%
16
18.6%
6
 
7.0%
Math Symbol
ValueCountFrequency (%)
+78
60.0%
|52
40.0%
Space Separator
ValueCountFrequency (%)
11867
100.0%
Dash Punctuation
ValueCountFrequency (%)
-583
100.0%
Open Punctuation
ValueCountFrequency (%)
(84
100.0%
Close Punctuation
ValueCountFrequency (%)
)84
100.0%
Modifier Symbol
ValueCountFrequency (%)
^8
100.0%
Final Punctuation
ValueCountFrequency (%)
6
100.0%
Other Symbol
ValueCountFrequency (%)
6
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 68851 | 80.3%
Common | 16843 | 19.6%
Han | 86 | 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a6542
 
9.5%
r6057
 
8.8%
e5459
 
7.9%
o4158
 
6.0%
t3884
 
5.6%
n3282
 
4.8%
i3260
 
4.7%
d2676
 
3.9%
l1886
 
2.7%
m1770
 
2.6%
Other values (41)29877
43.4%
Common
ValueCountFrequency (%)
11867
70.5%
/998
 
5.9%
,693
 
4.1%
-583
 
3.5%
2429
 
2.5%
1428
 
2.5%
&328
 
1.9%
3297
 
1.8%
.200
 
1.2%
!158
 
0.9%
Other values (21)862
 
5.1%
Han
ValueCountFrequency (%)
16
18.6%
16
18.6%
16
18.6%
16
18.6%
16
18.6%
6
 
7.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 85676 | 99.9%
CJK | 86 | 0.1%
Punctuation | 6 | < 0.1%
Dingbats | 6 | < 0.1%
None | 6 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
11867
 
13.9%
a6542
 
7.6%
r6057
 
7.1%
e5459
 
6.4%
o4158
 
4.9%
t3884
 
4.5%
n3282
 
3.8%
i3260
 
3.8%
d2676
 
3.1%
l1886
 
2.2%
Other values (69)36605
42.7%
CJK
ValueCountFrequency (%)
16
18.6%
16
18.6%
16
18.6%
16
18.6%
16
18.6%
6
 
7.0%
Punctuation
ValueCountFrequency (%)
6
100.0%
Dingbats
ValueCountFrequency (%)
6
100.0%
None
ValueCountFrequency (%)
6
100.0%

host_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 223
Distinct (%): 10.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 91269292.18
Minimum: 35384
Maximum: 379297950
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 35384
5-th percentile: 430015
Q1: 12576232
median: 43450256
Q3: 137754684
95-th percentile: 347407976
Maximum: 379297950
Range: 379262566
Interquartile range (IQR): 125178452

Descriptive statistics

Standard deviation: 106605849.1
Coefficient of variation (CV): 1.16803633
Kurtosis: 0.6280525953
Mean: 91269292.18
Median Absolute Deviation (MAD): 38872190
Skewness: 1.310243033
Sum: 1.939472459 × 10^11
Variance: 1.136480706 × 10^16
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
15154687 | 87 | 4.1%
43450256 | 86 | 4.0%
373675137 | 57 | 2.7%
21631889 | 54 | 2.5%
81038 | 38 | 1.8%
347407976 | 36 | 1.7%
66604416 | 35 | 1.6%
21745230 | 35 | 1.6%
119672800 | 34 | 1.6%
93503221 | 27 | 1.3%
Other values (213) | 1636 | 77.0%

Value | Count | Frequency (%)
35384 | 6 | 0.3%
81038 | 38 | 1.8%
93861 | 6 | 0.3%
229956 | 25 | 1.2%
306681 | 13 | 0.6%
393304 | 3 | 0.1%
404360 | 6 | 0.3%
405341 | 6 | 0.3%
430015 | 12 | 0.6%
886226 | 1 | < 0.1%

Value | Count | Frequency (%)
379297950 | 3 | 0.1%
377541652 | 5 | 0.2%
374663992 | 1 | < 0.1%
374060072 | 5 | 0.2%
373675137 | 57 | 2.7%
369560040 | 7 | 0.3%
365311308 | 7 | 0.3%
364585835 | 18 | 0.8%
351555234 | 1 | < 0.1%
347407976 | 36 | 1.7%

host_name
Categorical

HIGH CARDINALITY

Distinct: 201
Distinct (%): 9.5%
Missing: 0
Missing (%): 0.0%
Memory size: 16.7 KiB

John | 124
Steve | 95
Liya | 57
Ling Yi | 54
Louisa | 38
Other values (196) | 1757

Length

Max length: 27
Median length: 5
Mean length: 5.976941176
Min length: 1

Characters and Unicode

Total characters: 12701
Distinct characters: 58
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 0.7%

Sample

1st row: Adam
2nd row: Adam
3rd row: Paul
4th row: Kyle
5th row: Steve

Common Values

Value | Count | Frequency (%)
John | 124 | 5.8%
Steve | 95 | 4.5%
Liya | 57 | 2.7%
Ling Yi | 54 | 2.5%
Louisa | 38 | 1.8%
Alexander | 36 | 1.7%
Jurek | 35 | 1.6%
Toby & Quinn | 35 | 1.6%
Charlie | 34 | 1.6%
Mark | 33 | 1.6%
Other values (191) | 1584 | 74.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
john | 124 | 4.9%
(blank) | 110 | 4.4%
steve | 102 | 4.0%
liya | 57 | 2.3%
yi | 54 | 2.1%
ling | 54 | 2.1%
louisa | 38 | 1.5%
alexander | 36 | 1.4%
toby | 35 | 1.4%
jurek | 35 | 1.4%
Other values (207) | 1880 | 74.5%

Most occurring characters

ValueCountFrequency (%)
a1373
 
10.8%
e1343
 
10.6%
n1188
 
9.4%
i951
 
7.5%
r617
 
4.9%
l568
 
4.5%
o526
 
4.1%
J404
 
3.2%
404
 
3.2%
h374
 
2.9%
Other values (48)4953
39.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 9696 | 76.3%
Uppercase Letter | 2446 | 19.3%
Space Separator | 404 | 3.2%
Other Punctuation | 147 | 1.2%
Open Punctuation | 4 | < 0.1%
Close Punctuation | 4 | < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1373
14.2%
e1343
13.9%
n1188
12.3%
i951
9.8%
r617
 
6.4%
l568
 
5.9%
o526
 
5.4%
h374
 
3.9%
u354
 
3.7%
y326
 
3.4%
Other values (17)2076
21.4%
Uppercase Letter
ValueCountFrequency (%)
J404
16.5%
A250
10.2%
L232
9.5%
M228
9.3%
S165
 
6.7%
C143
 
5.8%
D131
 
5.4%
R122
 
5.0%
K115
 
4.7%
G104
 
4.3%
Other values (16)552
22.6%
Other Punctuation
ValueCountFrequency (%)
&137
93.2%
/10
 
6.8%
Space Separator
ValueCountFrequency (%)
404
100.0%
Open Punctuation
ValueCountFrequency (%)
(4
100.0%
Close Punctuation
ValueCountFrequency (%)
)4
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12142 | 95.6%
Common | 559 | 4.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1373
 
11.3%
e1343
 
11.1%
n1188
 
9.8%
i951
 
7.8%
r617
 
5.1%
l568
 
4.7%
o526
 
4.3%
J404
 
3.3%
h374
 
3.1%
u354
 
2.9%
Other values (43)4444
36.6%
Common
ValueCountFrequency (%)
404
72.3%
&137
 
24.5%
/10
 
1.8%
(4
 
0.7%
)4
 
0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12700 | > 99.9%
Latin 1 Sup | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a1373
 
10.8%
e1343
 
10.6%
n1188
 
9.4%
i951
 
7.5%
r617
 
4.9%
l568
 
4.5%
o526
 
4.1%
J404
 
3.2%
404
 
3.2%
h374
 
2.9%
Other values (47)4952
39.0%
Latin 1 Sup
ValueCountFrequency (%)
ó1
100.0%

neighbourhood
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 16.7 KiB

Cambridgeport | 327
Mid-Cambridge | 302
The Port | 272
North Cambridge | 226
Wellington-Harrington | 213
Other values (8) | 785

Length

Max length: 21
Median length: 13
Mean length: 13.44188235
Min length: 7

Characters and Unicode

Total characters: 28564
Distinct characters: 35
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: The Port
2nd row: The Port
3rd row: The Port
4th row: Cambridgeport
5th row: West Cambridge

Common Values

Value | Count | Frequency (%)
Cambridgeport | 327 | 15.4%
Mid-Cambridge | 302 | 14.2%
The Port | 272 | 12.8%
North Cambridge | 226 | 10.6%
Wellington-Harrington | 213 | 10.0%
East Cambridge | 185 | 8.7%
West Cambridge | 155 | 7.3%
Neighborhood Nine | 141 | 6.6%
Riverside | 113 | 5.3%
Strawberry Hill | 81 | 3.8%
Other values (3) | 110 | 5.2%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
cambridge | 572 | 17.7%
cambridgeport | 327 | 10.1%
mid-cambridge | 302 | 9.3%
the | 272 | 8.4%
port | 272 | 8.4%
north | 226 | 7.0%
wellington-harrington | 213 | 6.6%
east | 185 | 5.7%
west | 155 | 4.8%
nine | 141 | 4.4%
Other values (8) | 565 | 17.5%

Most occurring characters

ValueCountFrequency (%)
r2988
 
10.5%
i2589
 
9.1%
e2469
 
8.6%
g1839
 
6.4%
a1790
 
6.3%
d1763
 
6.2%
o1674
 
5.9%
t1672
 
5.9%
b1423
 
5.0%
C1201
 
4.2%
Other values (25)9156
32.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 23043 | 80.7%
Uppercase Letter | 3823 | 13.4%
Space Separator | 1105 | 3.9%
Dash Punctuation | 515 | 1.8%
Decimal Number | 39 | 0.1%
Other Punctuation | 39 | 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r2988
13.0%
i2589
11.2%
e2469
10.7%
g1839
8.0%
a1790
7.8%
d1763
7.7%
o1674
7.3%
t1672
7.3%
b1423
 
6.2%
m1201
 
5.2%
Other values (9)3635
15.8%
Uppercase Letter
ValueCountFrequency (%)
C1201
31.4%
N508
13.3%
W368
 
9.6%
M341
 
8.9%
T311
 
8.1%
H300
 
7.8%
P272
 
7.1%
E185
 
4.8%
R113
 
3.0%
A104
 
2.7%
Other values (2)120
 
3.1%
Space Separator
ValueCountFrequency (%)
1105
100.0%
Dash Punctuation
ValueCountFrequency (%)
-515
100.0%
Decimal Number
ValueCountFrequency (%)
239
100.0%
Other Punctuation
ValueCountFrequency (%)
/39
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 26866 | 94.1%
Common | 1698 | 5.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
r2988
11.1%
i2589
 
9.6%
e2469
 
9.2%
g1839
 
6.8%
a1790
 
6.7%
d1763
 
6.6%
o1674
 
6.2%
t1672
 
6.2%
b1423
 
5.3%
C1201
 
4.5%
Other values (21)7458
27.8%
Common
ValueCountFrequency (%)
1105
65.1%
-515
30.3%
239
 
2.3%
/39
 
2.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 28564 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r2988
 
10.5%
i2589
 
9.1%
e2469
 
8.6%
g1839
 
6.4%
a1790
 
6.3%
d1763
 
6.2%
o1674
 
5.9%
t1672
 
5.9%
b1423
 
5.0%
C1201
 
4.2%
Other values (25)9156
32.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 469
Distinct (%): 22.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.37307034
Minimum: 42.35564
Maximum: 42.40021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 42.35564
5-th percentile: 42.35889
Q1: 42.36608
median: 42.37089
Q3: 42.37778
95-th percentile: 42.39384
Maximum: 42.40021
Range: 0.04457
Interquartile range (IQR): 0.0117

Descriptive statistics

Standard deviation: 0.01010154402
Coefficient of variation (CV): 0.0002383953757
Kurtosis: -0.2066345378
Mean: 42.37307034
Median Absolute Deviation (MAD): 0.00528
Skewness: 0.7177496906
Sum: 90042.77448
Variance: 0.0001020411916
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
42.3677 | 32 | 1.5%
42.3695 | 18 | 0.8%
42.36962 | 12 | 0.6%
42.36832 | 12 | 0.6%
42.39467 | 12 | 0.6%
42.37315 | 12 | 0.6%
42.37168 | 12 | 0.6%
42.38768 | 12 | 0.6%
42.38116 | 12 | 0.6%
42.35969 | 11 | 0.5%
Other values (459) | 1980 | 93.2%

Value | Count | Frequency (%)
42.35564 | 9 | 0.4%
42.35589 | 6 | 0.3%
42.35667 | 1 | < 0.1%
42.35692 | 8 | 0.4%
42.35698 | 6 | 0.3%
42.35704 | 9 | 0.4%
42.3574 | 6 | 0.3%
42.35747 | 2 | 0.1%
42.35756 | 6 | 0.3%
42.35768 | 3 | 0.1%

Value | Count | Frequency (%)
42.40021 | 5 | 0.2%
42.39892 | 1 | < 0.1%
42.39889 | 3 | 0.1%
42.39884 | 1 | < 0.1%
42.39805 | 1 | < 0.1%
42.39764 | 5 | 0.2%
42.39634 | 8 | 0.4%
42.39633 | 9 | 0.4%
42.39612 | 4 | 0.2%
42.39604 | 7 | 0.3%

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 485
Distinct (%): 22.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -71.11003003
Minimum: -71.15592
Maximum: -71.06636
Zeros: 0
Zeros (%): 0.0%
Negative: 2125
Negative (%): 100.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: -71.15592
5-th percentile: -71.14064
Q1: -71.12288
median: -71.10793
Q3: -71.09768
95-th percentile: -71.08283
Maximum: -71.06636
Range: 0.08956
Interquartile range (IQR): 0.0252

Descriptive statistics

Standard deviation: 0.01768778101
Coefficient of variation (CV): -0.000248738202
Kurtosis: -0.2239837659
Mean: -71.11003003
Median Absolute Deviation (MAD): 0.011
Skewness: -0.430160778
Sum: -151108.8138
Variance: 0.0003128575972
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
-71.10584 | 32 | 1.5%
-71.10983 | 14 | 0.7%
-71.10338 | 13 | 0.6%
-71.0989 | 12 | 0.6%
-71.09921 | 12 | 0.6%
-71.1248 | 12 | 0.6%
-71.13326 | 11 | 0.5%
-71.13276 | 11 | 0.5%
-71.10493 | 10 | 0.5%
-71.1066 | 10 | 0.5%
Other values (475) | 1988 | 93.6%

Value | Count | Frequency (%)
-71.15592 | 6 | 0.3%
-71.15478 | 2 | 0.1%
-71.15448 | 2 | 0.1%
-71.1544 | 6 | 0.3%
-71.15405 | 1 | < 0.1%
-71.15401 | 1 | < 0.1%
-71.15354 | 8 | 0.4%
-71.1535 | 1 | < 0.1%
-71.15302 | 6 | 0.3%
-71.15278 | 4 | 0.2%

Value | Count | Frequency (%)
-71.06636 | 1 | < 0.1%
-71.0717 | 6 | 0.3%
-71.07226 | 1 | < 0.1%
-71.07265 | 2 | 0.1%
-71.07373 | 3 | 0.1%
-71.07707 | 5 | 0.2%
-71.07726 | 6 | 0.3%
-71.07787 | 6 | 0.3%
-71.07849 | 4 | 0.2%
-71.07861 | 6 | 0.3%
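
The Zeros, Negative, and Infinite rows in the longitude summary are simple counts over the column, each paired with its share of all observations. The sketch below shows one way to compute them; the function name and the three sample longitudes (the reported minimum, median, and maximum) are used only for illustration.

```python
import math


def sign_summary(values):
    """Return {label: (count, percent)} for zero, negative, and infinite values."""
    n = len(values)
    counts = {
        "zeros": sum(1 for v in values if v == 0),
        "negative": sum(1 for v in values if v < 0),
        "infinite": sum(1 for v in values if math.isinf(v)),
    }
    return {label: (count, round(100 * count / n, 1)) for label, count in counts.items()}


longitudes = [-71.15592, -71.10793, -71.06636]
summary = sign_summary(longitudes)
print(summary)
```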

room_type
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.7 KiB

Entire home/apt | 1199
Private room | 926

Length

Max length: 15
Median length: 15
Mean length: 13.69270588
Min length: 12

Characters and Unicode

Total characters: 29097
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Private room
2nd row: Private room
3rd row: Entire home/apt
4th row: Entire home/apt
5th row: Entire home/apt

Common Values

Value | Count | Frequency (%)
Entire home/apt | 1199 | 56.4%
Private room | 926 | 43.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
entire | 1199 | 28.2%
home/apt | 1199 | 28.2%
room | 926 | 21.8%
private | 926 | 21.8%

Most occurring characters

ValueCountFrequency (%)
t3324
11.4%
e3324
11.4%
r3051
10.5%
o3051
10.5%
i2125
 
7.3%
a2125
 
7.3%
2125
 
7.3%
m2125
 
7.3%
E1199
 
4.1%
n1199
 
4.1%
Other values (5)5449
18.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 23648 | 81.3%
Uppercase Letter | 2125 | 7.3%
Space Separator | 2125 | 7.3%
Other Punctuation | 1199 | 4.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t3324
14.1%
e3324
14.1%
r3051
12.9%
o3051
12.9%
i2125
9.0%
a2125
9.0%
m2125
9.0%
n1199
 
5.1%
h1199
 
5.1%
p1199
 
5.1%
Uppercase Letter
ValueCountFrequency (%)
E1199
56.4%
P926
43.6%
Space Separator
ValueCountFrequency (%)
2125
100.0%
Other Punctuation
ValueCountFrequency (%)
/1199
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 25773 | 88.6%
Common | 3324 | 11.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
t3324
12.9%
e3324
12.9%
r3051
11.8%
o3051
11.8%
i2125
8.2%
a2125
8.2%
m2125
8.2%
E1199
 
4.7%
n1199
 
4.7%
h1199
 
4.7%
Other values (3)3051
11.8%
Common
ValueCountFrequency (%)
2125
63.9%
/1199
36.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 29097 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t3324
11.4%
e3324
11.4%
r3051
10.5%
o3051
10.5%
i2125
 
7.3%
a2125
 
7.3%
2125
 
7.3%
m2125
 
7.3%
E1199
 
4.1%
n1199
 
4.1%
Other values (5)5449
18.7%

price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 267
Distinct (%): 12.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.9642353
Minimum: 19
Maximum: 950
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 19
5-th percentile: 35
Q1: 60
median: 103
Q3: 162
95-th percentile: 300
Maximum: 950
Range: 931
Interquartile range (IQR): 102

Descriptive statistics

Standard deviation: 97.03084353
Coefficient of variation (CV): 0.7582653333
Kurtosis: 14.81399519
Mean: 127.9642353
Median Absolute Deviation (MAD): 47
Skewness: 2.798750012
Sum: 271924
Variance: 9414.984596
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
60 | 64 | 3.0%
65 | 64 | 3.0%
99 | 47 | 2.2%
110 | 43 | 2.0%
150 | 39 | 1.8%
250 | 37 | 1.7%
55 | 37 | 1.7%
100 | 36 | 1.7%
70 | 33 | 1.6%
125 | 33 | 1.6%
Other values (257) | 1692 | 79.6%

Value | Count | Frequency (%)
19 | 1 | < 0.1%
23 | 2 | 0.1%
25 | 9 | 0.4%
27 | 3 | 0.1%
28 | 3 | 0.1%
29 | 28 | 1.3%
30 | 12 | 0.6%
31 | 4 | 0.2%
32 | 8 | 0.4%
33 | 15 | 0.7%

Value | Count | Frequency (%)
950 | 4 | 0.2%
900 | 2 | 0.1%
650 | 1 | < 0.1%
551 | 1 | < 0.1%
538 | 1 | < 0.1%
509 | 1 | < 0.1%
507 | 1 | < 0.1%
500 | 6 | 0.3%
485 | 1 | < 0.1%
475 | 6 | 0.3%
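
The numeric variables in this report carry a "Histogram with fixed size bins" caption. As a sketch of what fixed-size binning means, the function below splits the interval from the minimum to the maximum into equal-width bins and counts the values in each. The function name and the toy prices (the reported quantiles of price, not the full column) are assumptions for this example, and the report's own plotting code is not shown in the source.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [len(values)] + [0] * (bins - 1), 0.0
    counts = [0] * bins
    for v in values:
        # The maximum sits on the last bin's right edge, so clamp it into that bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts, width


prices = [19, 35, 60, 103, 162, 300, 950]
counts, width = fixed_width_histogram(prices, bins=50)
print(width, counts[0], counts[-1])
```

With a long right tail such as price (maximum 950, 95-th percentile 300), most values land in the first few bins and the remaining bins stay nearly empty.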

minimum_nights
Real number (ℝ≥0)

Distinct: 41
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.39670588
Minimum: 1
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 2
Q3: 9
95-th percentile: 60
Maximum: 365
Range: 364
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 27.62080619
Coefficient of variation (CV): 2.228076269
Kurtosis: 55.31757426
Mean: 12.39670588
Median Absolute Deviation (MAD): 1
Skewness: 5.980811417
Sum: 26343
Variance: 762.9089345
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=41)
Value | Count | Frequency (%)
1 | 760 | 35.8%
2 | 491 | 23.1%
3 | 207 | 9.7%
30 | 165 | 7.8%
32 | 73 | 3.4%
4 | 65 | 3.1%
28 | 45 | 2.1%
5 | 35 | 1.6%
31 | 27 | 1.3%
60 | 26 | 1.2%
Other values (31) | 231 | 10.9%

Value | Count | Frequency (%)
1 | 760 | 35.8%
2 | 491 | 23.1%
3 | 207 | 9.7%
4 | 65 | 3.1%
5 | 35 | 1.6%
6 | 8 | 0.4%
7 | 25 | 1.2%
9 | 6 | 0.3%
10 | 7 | 0.3%
11 | 1 | < 0.1%

Value | Count | Frequency (%)
365 | 3 | 0.1%
300 | 2 | 0.1%
200 | 3 | 0.1%
180 | 6 | 0.3%
145 | 2 | 0.1%
110 | 4 | 0.2%
100 | 8 | 0.4%
95 | 1 | < 0.1%
91 | 22 | 1.0%
90 | 11 | 0.5%
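
For minimum_nights the standard deviation is about 27.6 while the Median Absolute Deviation (MAD) is 1, which shows how much less sensitive MAD is to a few very large values. The sketch below implements the unscaled definition, the median of absolute deviations from the median; whether the report applies any scaling factor is not stated, so that is an assumption. The toy list reuses the reported quantiles of this variable rather than the real column.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (no scaling factor)."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


nights = [1, 1, 2, 9, 60]
mad = median_absolute_deviation(nights)
print(mad)
```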

number_of_reviews
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 296
Distinct (%): 13.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88.55576471
Minimum: 1
Maximum: 588
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 12
median: 54
Q3: 139
95-th percentile: 280
Maximum: 588
Range: 587
Interquartile range (IQR): 127

Descriptive statistics

Standard deviation: 97.91212729
Coefficient of variation (CV): 1.105655037
Kurtosis: 3.32078882
Mean: 88.55576471
Median Absolute Deviation (MAD): 50
Skewness: 1.672521384
Sum: 188181
Variance: 9586.784671
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 171 | 8.0%
2 | 85 | 4.0%
3 | 61 | 2.9%
4 | 46 | 2.2%
9 | 33 | 1.6%
30 | 32 | 1.5%
20 | 31 | 1.5%
10 | 29 | 1.4%
18 | 26 | 1.2%
5 | 24 | 1.1%
Other values (286) | 1587 | 74.7%

Value | Count | Frequency (%)
1 | 171 | 8.0%
2 | 85 | 4.0%
3 | 61 | 2.9%
4 | 46 | 2.2%
5 | 24 | 1.1%
6 | 23 | 1.1%
7 | 15 | 0.7%
8 | 10 | 0.5%
9 | 33 | 1.6%
10 | 29 | 1.4%

Value | Count | Frequency (%)
588 | 4 | 0.2%
586 | 2 | 0.1%
513 | 1 | < 0.1%
500 | 1 | < 0.1%
488 | 1 | < 0.1%
476 | 1 | < 0.1%
462 | 1 | < 0.1%
449 | 1 | < 0.1%
440 | 6 | 0.3%
432 | 8 | 0.4%

last_review
Categorical

HIGH CARDINALITY

Distinct: 308
Distinct (%): 14.5%
Missing: 0
Missing (%): 0.0%
Memory size: 16.7 KiB

2020-08-31 | 37
2020-11-22 | 28
2020-11-01 | 28
2020-05-31 | 27
2020-10-01 | 26
Other values (303) | 1979

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 21250
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 42
Unique (%): 2.0%

Sample

1st row: 2020-04-24
2nd row: 2020-04-01
3rd row: 2020-04-02
4th row: 2020-04-10
5th row: 2020-04-02

Common Values

Value | Count | Frequency (%)
2020-08-31 | 37 | 1.7%
2020-11-22 | 28 | 1.3%
2020-11-01 | 28 | 1.3%
2020-05-31 | 27 | 1.3%
2020-10-01 | 26 | 1.2%
2020-10-12 | 26 | 1.2%
2020-11-21 | 25 | 1.2%
2020-10-25 | 25 | 1.2%
2020-04-01 | 24 | 1.1%
2021-01-31 | 22 | 1.0%
Other values (298) | 1857 | 87.4%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
2020-08-31 | 37 | 1.7%
2020-11-22 | 28 | 1.3%
2020-11-01 | 28 | 1.3%
2020-05-31 | 27 | 1.3%
2020-10-01 | 26 | 1.2%
2020-10-12 | 26 | 1.2%
2020-11-21 | 25 | 1.2%
2020-10-25 | 25 | 1.2%
2020-04-01 | 24 | 1.1%
2021-01-31 | 22 | 1.0%
Other values (298) | 1857 | 87.4%

Most occurring characters

ValueCountFrequency (%)
06197
29.2%
25435
25.6%
-4250
20.0%
12946
13.9%
3488
 
2.3%
5396
 
1.9%
9375
 
1.8%
4354
 
1.7%
8329
 
1.5%
6265
 
1.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 17000 | 80.0%
Dash Punctuation | 4250 | 20.0%
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
06197
36.5%
25435
32.0%
12946
17.3%
3488
 
2.9%
5396
 
2.3%
9375
 
2.2%
4354
 
2.1%
8329
 
1.9%
6265
 
1.6%
7215
 
1.3%
Dash Punctuation
ValueCountFrequency (%)
-4250
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 21250 | 100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
06197
29.2%
25435
25.6%
-4250
20.0%
12946
13.9%
3488
 
2.3%
5396
 
1.9%
9375
 
1.8%
4354
 
1.7%
8329
 
1.5%
6265
 
1.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 21250 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
06197
29.2%
25435
25.6%
-4250
20.0%
12946
13.9%
3488
 
2.3%
5396
 
1.9%
9375
 
1.8%
4354
 
1.7%
8329
 
1.5%
6265
 
1.2%

reviews_per_month
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 612
Distinct (%): 28.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.325327059
Minimum: 0.06
Maximum: 10.86
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 0.06
5-th percentile: 0.202
Q1: 0.66
median: 1.89
Q3: 3.51
95-th percentile: 6.238
Maximum: 10.86
Range: 10.8
Interquartile range (IQR): 2.85

Descriptive statistics

Standard deviation: 1.967551977
Coefficient of variation (CV): 0.8461398878
Kurtosis: 1.435349074
Mean: 2.325327059
Median Absolute Deviation (MAD): 1.34
Skewness: 1.182614222
Sum: 4941.32
Variance: 3.87126078
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 47 | 2.2%
0.2 | 21 | 1.0%
0.33 | 18 | 0.8%
0.25 | 16 | 0.8%
0.37 | 15 | 0.7%
2 | 15 | 0.7%
0.43 | 15 | 0.7%
0.17 | 14 | 0.7%
0.19 | 13 | 0.6%
0.48 | 13 | 0.6%
Other values (602) | 1938 | 91.2%

Value | Count | Frequency (%)
0.06 | 1 | < 0.1%
0.07 | 2 | 0.1%
0.08 | 1 | < 0.1%
0.1 | 4 | 0.2%
0.11 | 5 | 0.2%
0.12 | 7 | 0.3%
0.13 | 4 | 0.2%
0.14 | 7 | 0.3%
0.15 | 7 | 0.3%
0.16 | 10 | 0.5%

Value | Count | Frequency (%)
10.86 | 1 | < 0.1%
10.45 | 1 | < 0.1%
10.38 | 2 | 0.1%
10.22 | 1 | < 0.1%
10.21 | 1 | < 0.1%
10.15 | 1 | < 0.1%
10.14 | 1 | < 0.1%
10.13 | 1 | < 0.1%
10 | 1 | < 0.1%
9.98 | 1 | < 0.1%
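
The values of reviews_per_month listed above have at most two decimals, yet the 5-th and 95-th percentiles are reported as 0.202 and 6.238. A plausible explanation, though the report does not state it, is that percentiles are linearly interpolated between neighbouring sorted values. The sketch below implements that interpolation rule; the function name and toy data are assumptions for illustration only.

```python
def percentile(values, p):
    """Percentile p (0-100) with linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Toy values, not the real column.
rates = [0.06, 0.20, 0.21, 0.66, 1.89, 3.51]
mid = percentile(rates, 50)
print(mid)
```

Here the 50th percentile falls halfway between 0.21 and 0.66, so the result has more decimals than any input value.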

calculated_host_listings_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 25
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.265411765
Minimum: 1
Maximum: 41
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 3
Q3: 5
95-th percentile: 17
Maximum: 41
Range: 40
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 5.976582825
Coefficient of variation (CV): 1.135064662
Kurtosis: 6.026209649
Mean: 5.265411765
Median Absolute Deviation (MAD): 2
Skewness: 2.213838707
Sum: 11189
Variance: 35.71954226
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=25)
Value | Count | Frequency (%)
1 | 600 | 28.2%
3 | 366 | 17.2%
2 | 256 | 12.0%
4 | 233 | 11.0%
5 | 139 | 6.5%
17 | 82 | 3.9%
13 | 78 | 3.7%
6 | 59 | 2.8%
9 | 53 | 2.5%
15 | 45 | 2.1%
Other values (15) | 214 | 10.1%

Value | Count | Frequency (%)
1 | 600 | 28.2%
2 | 256 | 12.0%
3 | 366 | 17.2%
4 | 233 | 11.0%
5 | 139 | 6.5%
6 | 59 | 2.8%
7 | 26 | 1.2%
8 | 27 | 1.3%
9 | 53 | 2.5%
10 | 12 | 0.6%

Value | Count | Frequency (%)
41 | 6 | 0.3%
36 | 1 | < 0.1%
35 | 3 | 0.1%
33 | 3 | 0.1%
29 | 8 | 0.4%
23 | 26 | 1.2%
22 | 14 | 0.7%
18 | 13 | 0.6%
17 | 82 | 3.9%
16 | 26 | 1.2%

availability_365
Real number (ℝ≥0)

ZEROS

Distinct: 354
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 173.1416471
Minimum: 0
Maximum: 365
Zeros: 228
Zeros (%): 10.7%
Negative: 0
Negative (%): 0.0%
Memory size: 16.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 56
median: 159
Q3: 300
95-th percentile: 365
Maximum: 365
Range: 365
Interquartile range (IQR): 244

Descriptive statistics

Standard deviation: 129.5564728
Coefficient of variation (CV): 0.7482686864
Kurtosis: -1.408486453
Mean: 173.1416471
Median Absolute Deviation (MAD): 123
Skewness: 0.1412735602
Sum: 367926
Variance: 16784.87964
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 228 | 10.7%
365 | 133 | 6.3%
1 | 60 | 2.8%
180 | 41 | 1.9%
364 | 35 | 1.6%
179 | 27 | 1.3%
363 | 26 | 1.2%
362 | 25 | 1.2%
90 | 24 | 1.1%
147 | 21 | 1.0%
Other values (344) | 1505 | 70.8%

Value | Count | Frequency (%)
0 | 228 | 10.7%
1 | 60 | 2.8%
2 | 10 | 0.5%
3 | 11 | 0.5%
4 | 3 | 0.1%
5 | 4 | 0.2%
6 | 7 | 0.3%
7 | 8 | 0.4%
8 | 2 | 0.1%
9 | 2 | 0.1%

Value | Count | Frequency (%)
365 | 133 | 6.3%
364 | 35 | 1.6%
363 | 26 | 1.2%
362 | 25 | 1.2%
361 | 12 | 0.6%
360 | 15 | 0.7%
359 | 15 | 0.7%
358 | 6 | 0.3%
357 | 5 | 0.2%
356 | 6 | 0.3%

Interactions

[Figures: 100 pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
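
A minimal sketch of this calculation in plain Python follows. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the report. It also shows the invariance under a change of location and positive scale.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Hypothetical sample data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r_original = pearson_r(x, y)
# Shifting and rescaling x (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([3 * a + 7 for a in x], y)
print(r_original, r_rescaled)
```

The n - 1 denominators cancel in the ratio, so using population or sample formulas gives the same r.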

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
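
The rank-based calculation can be sketched in plain Python as below. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are illustrative assumptions. Ties receive the average of the ranks they span, which is a common convention and not something this report specifies.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this monotonic but nonlinear example, ρ reaches its maximum while r stays below it.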

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
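
A minimal sketch of the pair-counting definition follows. The function name `kendall_tau` and the sample data are illustrative assumptions. Tied pairs are counted as neither concordant nor discordant here, and tie-adjusted variants used by statistical libraries may give different values when ties are present.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one swapped pair out of six.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.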

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
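
A minimal sketch of a bias-corrected Cramér's V computed from a contingency table follows. It uses the correction as it is commonly stated for Bergsma's estimator. The function name, the input layout (a list of rows of counts), and the assumption that no row or column total is zero are illustrative, and this is not taken from the code that produced this report.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrected phi-squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
print(v_perfect, v_independent)
```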

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

id  name  host_id  host_name  neighbourhood  latitude  longitude  room_type  price  minimum_nights  number_of_reviews  last_review  reviews_per_month  calculated_host_listings_count  availability_365
01193862Middle Room in Shared Apt229956AdamThe Port42.36494-71.10054Private room2321252020-04-241.4934
11193875Large Downstairs Room229956AdamThe Port42.36433-71.09911Private room3331642020-04-011.9837
21225831Victorian Charm MIT/Harvard/Kendall/Central-1BR3380576PaulThe Port42.36458-71.09845Entire home/apt15534292020-04-025.121310
31307195Harvard and MIT - Enjoy Comfort and Convenience!7106416KyleCambridgeport42.36392-71.10191Entire home/apt46924202020-04-106.11168
41984737Charming Harvard Victorian8824696SteveWest Cambridge42.38116-71.13326Entire home/apt42522932020-04-024.072306
52538983Spacious 2 bedrooms Apt-Roof deck NO Cleaning fee13000172DanCambridgeport42.35704-71.10909Entire home/apt35013692020-04-206.381231
63434822Harvard MIT: Artist's Home5871398BeverlyNorth Cambridge42.39403-71.13094Private room79145302020-04-160.422359
73774900City Oasis |Deck & Yard |Walk To Harvard MIT Train6823717Juan CarlosCambridgeport42.36079-71.11365Entire home/apt17022612020-04-063.83164
84956321Charming Third Floor Apartment25547444SusanAgassiz42.38412-71.11484Entire home/apt14541222020-04-061.911212
96185544Sunny 3 bed/2 bath-5 min to T- Harvard/MIT/Boston21745230JurekNorth Cambridge42.39633-71.13642Entire home/apt2501732020-04-151.266276

Last rows

id  name  host_id  host_name  neighbourhood  latitude  longitude  room_type  price  minimum_nights  number_of_reviews  last_review  reviews_per_month  calculated_host_listings_count  availability_365
211546820093Marvelous 4 bed 1 bath 1 free parking Harvard SQ373675137LiyaWest Cambridge42.37352-71.12260Entire home/apt150222021-02-140.8823155
211646903645Most popular 3 bed rooms in Harvard Sq373675137LiyaWest Cambridge42.37433-71.12411Entire home/apt151212021-03-081.0023230
211746938818Walk To Harvard379297950IsmayilStrawberry Hill42.37767-71.14900Entire home/apt119112021-01-010.351365
211847223523Entire private studio suite in Harvard/MIT45011296LanceMid-Cambridge42.36849-71.10537Entire home/apt295322021-02-251.36438
211947296747Central Square - Between Harvard & MIT74045635IanCambridgeport42.36333-71.10285Private room35232021-02-091.4110
212047538101SoloPrivate Space270651080OlanThe Port42.37122-71.09942Entire home/apt85252021-03-202.05252
212147703691ENTIRE APT: Bright Sun Drenched Spot in Cambridge!886226KibbeeWellington-Harrington42.37229-71.09891Entire home/apt85432021-03-233.0010
212248107460I2 Private Room by Kendall Sq218493228JohnWellington-Harrington42.36992-71.09612Private room46122021-03-092.003287
212348108594Private room next to MIT/Harvard 3247533528RoxyMid-Cambridge42.37185-71.10017Private room47112021-02-160.773235
212448157277Rice Street Studio374663992AmandaNorth Cambridge42.39563-71.12863Entire home/apt116222021-03-212.001178

Duplicate rows

Most frequently occurring

id  name  host_id  host_name  neighbourhood  latitude  longitude  room_type  price  minimum_nights  number_of_reviews  last_review  reviews_per_month  calculated_host_listings_count  availability_365  # duplicates
77774831Charming house in Cambridge C33W12576232ThomasThe Port42.36946-71.09899Private room4532112020-11-250.2153653
01330779Huron Village Lower Unit Harvard Sq4642626AmyNeighborhood Nine42.38905-71.12425Entire home/apt2007652020-05-310.72502
13434822Harvard MIT: Artist's Home5871398BeverlyNorth Cambridge42.39403-71.13094Private room84180302020-04-160.3723652
23434822Harvard MIT: Artist's Home5871398BeverlyNorth Cambridge42.39403-71.13094Private room84180302020-04-160.3823652
33434822Harvard MIT: Artist's Home5871398BeverlyNorth Cambridge42.39403-71.13094Private room84180302020-04-160.3923652
43610778Private Room with Bunk Bed near Harvard/MIT4297079Charlton & TheresaEast Cambridge42.36903-71.08794Private room854282020-09-250.351912
53610778Private Room with Bunk Bed near Harvard/MIT4297079Charlton & TheresaEast Cambridge42.36903-71.08794Private room854282020-09-250.36102
67007117Room near Alewife Red Line T stop.21745230JurekNorth Cambridge42.39634-71.13683Private room601292020-04-250.4363652
814213983Great Cambridge studio, great Harvard sq location1473780BernardoRiverside42.36660-71.11397Entire home/apt1353112020-11-240.201882
914727253Charming House in Cambridge C36W12576232ThomasThe Port42.36931-71.09760Private room4532102020-11-200.1953652